Search CORE

FLAVIdB: A data mining system for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology

Author: Brusic Vladimir
Olsen Lars Rønn
Reinherz Ellis L.
Zhang Guang Lan
Publication venue
Publication date: 01/01/2011
Field of study

BACKGROUND: The flavivirus genus is unusually large, comprising more than 70 species, of which more than half are known human pathogens. It includes a set of clinically relevant infectious agents such as dengue, West Nile, yellow fever, and Japanese encephalitis viruses. Although these pathogens have been studied extensively, safe and efficient vaccines lack for the majority of the flaviviruses. RESULTS: We have assembled a database that combines antigenic data of flaviviruses, specialized analysis tools, and workflows for automated complex analyses focusing on applications in immunology and vaccinology. FLAVIdB contains 12,858 entries of flavivirus antigen sequences, 184 verified T-cell epitopes, 201 verified B-cell epitopes, and 4 representative molecular structures of the dengue virus envelope protein. FLAVIdB was assembled by collection, annotation, and integration of data from GenBank, GenPept, UniProt, IEDB, and PDB. The data were subject to extensive quality control (redundancy elimination, error detection, and vocabulary consolidation). Further annotation of selected functionally relevant features was performed by organizing information extracted from the literature. The database was incorporated into a web-accessible data mining system, combining specialized data analysis tools for integrated analysis of relevant data categories (protein sequences, macromolecular structures, and immune epitopes). The data mining system includes tools for variability and conservation analysis, T-cell epitope prediction, and characterization of neutralizing components of B-cell epitopes. FLAVIdB is accessible at cvc.dfci.harvard.edu/flavi/ CONCLUSION: FLAVIdB represents a new generation of databases in which data and tools are integrated into a data mining infrastructures specifically designed to aid rational vaccine design by discovery of vaccine targets

Frontiers - Publisher Connector

Characterizing the human hematopoietic CDome

Author: Barnkob Mike Stein
Olsen Lars Rønn
Simon Christian
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014
Field of study

In this study, we performed extensive semi-automated data collection from the primary and secondary literature in an effort to characterize the expression of all membrane proteins within the CD scheme on hematopoietic cells. Utilizing over 6000 data points across 305 CD molecules on 206 cell types, we seek to give a preliminary characterization of the “human hematopoietic CDome.” We encountered severe gaps in the knowledge of CD protein expression, mostly resulting from incomplete and unstructured data generation, which we argue inhibit both basic research as well as therapies seeking to target membrane proteins. We detail these shortcomings and propose strategies to overcome these issues. Analyzing the available data, we explore the functional characteristics of the CD molecules both individually and across the groups of hematopoietic cells on which they are expressed. We compare protein and mRNA data for a subset of CD molecules, and explore cell functions in the context of CD protein expression. We find that the presence and function of CD molecules serve as good indicators for the overall function of the cells that express them, suggesting that increasing our knowledge about the cellular CDome may serve to stratify cells on a more functional level

University of Southern Denmark Research Output

Harvard University - DASH

Recommended from our members

Conservation Analysis of Dengue Virus T-cell Epitope-Based Vaccine Candidates Using Peptide Block Entropy

Author: Brusic Vladimir
Keskin Derin Benerci
Olsen Lars Rønn
Reinherz Ellis Leonard
Zhang Guang Lan
Publication venue: 'Frontiers Media SA'
Publication date: 05/03/2013
Field of study

Broad coverage of the pathogen population is particularly important when designing CD8+ T-cell epitope vaccines against viral pathogens. Traditional approaches are based on combinations of highly conserved T-cell epitopes. Peptide block entropy analysis is a novel approach for assembling sets of broadly covering antigens. Since T-cell epitopes are recognized as peptides rather than individual residues, this method is based on calculating the information content of blocks of peptides from a multiple sequence alignment of homologous proteins rather than using the information content of individual residues. The block entropy analysis provides broad coverage of variant antigens. We applied the block entropy analysis method to the proteomes of the four serotypes of dengue virus (DENV) and found 1,551 blocks of 9-mer peptides, which cover 99% of available sequences with five or fewer unique peptides. In contrast, the benchmark study by Khan et al. (2008) resulted in 165 conserved 9-mer peptides. Many of the conserved blocks are located consecutively in the proteins. Connecting these blocks resulted in 78 conserved regions. Of the 1551 blocks of 9-mer peptides 110 comprised predicted HLA binder sets. In total, 457 subunit peptides that encompass the diversity of all sequenced DENV strains of which 333 are T-cell epitope candidates

Recommended from our members

Tumor antigens as proteogenomic biomarkers in invasive ductal carcinomas

Author: Brusic Vladimir
Campos Benito
Karger Barry L.
Olsen Lars Rønn
Sgroi Dennis C.
Winther Ole
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Background: The majority of genetic biomarkers for human cancers are defined by statistical screening of high-throughput genomics data. While a large number of genetic biomarkers have been proposed for diagnostic and prognostic applications, only a small number have been applied in the clinic. Similarly, the use of proteomics methods for the discovery of cancer biomarkers is increasing. The emerging field of proteogenomics seeks to enrich the value of genomics and proteomics approaches by studying the intersection of genomics and proteomics data. This task is challenging due to the complex nature of transcriptional and translation regulatory mechanisms and the disparities between genomic and proteomic data from the same samples. In this study, we have examined tumor antigens as potential biomarkers for breast cancer using genomics and proteomics data from previously reported laser capture microdissected ER+ tumor samples. Results: We applied proteogenomic analyses to study the genetic aberrations of 32 tumor antigens determined in the proteomic data. We found that tumor antigens that are aberrantly expressed at the genetic level and expressed at the protein level, are likely involved in perturbing pathways directly linked to the hallmarks of cancer. The results found by proteogenomic analysis of the 32 tumor antigens studied here, capture largely the same pathway irregularities as those elucidated from large-scale screening of genomics analyses, where several thousands of genes are often found to be perturbed. Conclusion: Tumor antigens are a group of proteins recognized by the cells of the immune system. Specifically, they are recognized in tumor cells where they are present in larger than usual amounts, or are physiochemically altered to a degree at which they no longer resemble native human proteins. This proteogenomic analysis of 32 tumor antigens suggests that tumor antigens have the potential to be highly specific biomarkers for different cancers

Harvard University - DASH

Heidelberger Dokumentenserver

Nazarbayev University Repository

Directory of Open Access Journals

Using microarray-based subtyping methods for breast cancer in the era of high-throughput RNA sequencing

Author: Nielsen Finn Cilius
Olsen Lars Rønn
Pedersen Christina Bligaard
Rossing Maria
Publication venue: 'Wiley'
Publication date: 01/01/2018
Field of study

Breast cancer is a highly heterogeneous disease that can be classified into multiple subtypes based on the tumor transcriptome. Most of the subtyping schemes used in clinics today are derived from analyses of microarray data from thousands of different tumors together with clinical data for the patients from which the tumors were isolated. However, RNA sequencing (RNA‐Seq) is gradually replacing microarrays as the preferred transcriptomics platform, and although transcript abundances measured by the two different technologies are largely compatible, subtyping methods developed for probe‐based microarray data are incompatible with RNA‐Seq as input data. Here, we present an RNA‐Seq data processing pipeline, which relies on the mapping of sequencing reads to the probe set target sequences instead of the human reference genome, thereby enabling probe‐based subtyping of breast cancer tumor tissue using sequencing‐based transcriptomics. By analyzing 66 breast cancer tumors for which gene expression was measured using both microarrays and RNA‐Seq, we show that RNA‐Seq data can be directly compared to microarray data using our pipeline. Additionally, we demonstrate that the established subtyping method CITBCMST (Guedj et al., ), which relies on a 375 probe set‐signature to classify samples into the six subtypes basL, lumA, lumB, lumC, mApo, and normL, can be applied without further modifications. This pipeline enables a seamless transition to sequencing‐based transcriptomics for future clinical purposes

Boston University Institutional Repository (OpenBU)

TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes

Author: Brusic Vladimir
Chitkushev Lou
Keskin Derin B.
Olsen Lars Rønn
Zhang Guanglan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

We previously developed TANTIGEN, a comprehensive online database cataloging more than 1000 T cell epitopes and HLA ligands from 292 tumor antigens. In TANTIGEN 2.0, we significantly expanded coverage in both immune response targets (T cell epitopes and HLA ligands) and tumor antigens. It catalogs 4,296 antigen variants from 403 unique tumor antigens and more than 1500 T cell epitopes and HLA ligands. We also included neoantigens, a class of tumor antigens generated through mutations resulting in new amino acid sequences in tumor antigens. TANTIGEN 2.0 contains validated TCR sequences specific for cognate T cell epitopes and tumor antigen gene/mRNA/protein expression information in major human cancers extracted by Human Pathology Atlas. TANTIGEN 2.0 is a rich data resource for tumor antigens and their associated epitopes and neoepitopes. It hosts a set of tailored data analytics tools tightly integrated with the data to form meaningful analysis workflows. It is freely available at http://projects.met-hilab.org/tadb

Nottingham ePrints

Nottingham eTheses

Directory of Open Access Journals

BioReader: a text mining tool for performing classification of biomedical literature

Author: Barnkob Mike Bogetofte
Davidsen Kristian
Hansen Christina
Olsen Lars Rønn
Seymour Emily
Simon Christian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

Abstract Background Scientific data and research results are being published at an unprecedented rate. Many database curators and researchers utilize data and information from the primary literature to populate databases, form hypotheses, or as the basis for analyses or validation of results. These efforts largely rely on manual literature surveys for collection of these data, and while querying the vast amounts of literature using keywords is enabled by repositories such as PubMed, filtering relevant articles from such query results can be a non-trivial and highly time consuming task. Results We here present a tool that enables users to perform classification of scientific literature by text mining-based classification of article abstracts. BioReader (Biomedical Research Article Distiller) is trained by uploading article corpora for two training categories - e.g. one positive and one negative for content of interest - as well as one corpus of abstracts to be classified and/or a search string to query PubMed for articles. The corpora are submitted as lists of PubMed IDs and the abstracts are automatically downloaded from PubMed, preprocessed, and the unclassified corpus is classified using the best performing classification algorithm out of ten implemented algorithms. Conclusion BioReader supports data and information collection by implementing text mining-based classification of primary biomedical literature in a web interface, thus enabling curators and researchers to take advantage of the vast amounts of data and information in the published literature. BioReader outperforms existing tools with similar functionalities and expands the features used for mining literature in database curation efforts. The tool is freely available as a web service at http://www.cbs.dtu.dk/services/BioReade

The USER cloning standard

Author: Bonde Mads Tvillinggaard
Genee Hans Jasper
Hansen Bjarne Gram
Hansen Niels Bjørn
Kaas Christian Schrøder
Matos Cláudia
Olsen Lars Rønn
Publication venue
Publication date: 01/11/2009
Field of study

This BioBricks Foundation Request for Comments (BBF RFC) provides information about the design of uracil-containing primers used for USER cloning and USER fusion

DSpace@MIT

Conservation analysis of dengue virust-cell epitope-based vaccine candidates using peptide block entropy

Author: Brusic Vladimir
Keskin Derin B.
Olsen Lars Rønn
Reinherz Ellis L.
Zhang Guang Lan
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2011
Field of study